Improving Full-Text Precision on Short Queries using Simple Constraints

Author

  • Marti A. Hearst
Abstract

We show that two simple constraints, when applied to short user queries (on the order of 5 to 10 words), can yield precision scores comparable to or better than those achieved with long queries (50 to 85 words) at low document cutoff levels. These constraints are meant to detect documents that contain subtopic passages including the most important components of the query. The constraints are: (i) a simple Boolean constraint, which requires the user to specify the query as a list of topics, and which the system converts into a conjunct of disjuncts; and (ii) a subtopic-sized proximity constraint imposed over the Boolean constraint. The vector space model is used to rank the documents that satisfy both constraints. Experiments run over 45 TREC queries show significant and nearly consistent improvements over rankings that use no constraints. These results have important ramifications for interactive systems intended for casual users, such as those searching the World Wide Web.
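
The two constraints and the final ranking step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Each topic is assumed to be a list of lowercase, single-word alternative terms (the disjunct), and the topics together form the conjunct. The "subtopic-sized" passage is approximated by a fixed window of `window` tokens, with an arbitrary placeholder default of 100. Ranking uses plain tf-idf cosine similarity as one common instance of the vector space model. All function names, the window size, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def satisfies_boolean(tokens, topics):
    """Conjunct of disjuncts: every topic (a list of alternative terms)
    must be matched by at least one of its terms."""
    present = set(tokens)
    return all(any(term in present for term in topic) for topic in topics)


def satisfies_proximity(tokens, topics, window):
    """True if some run of `window` consecutive tokens satisfies the Boolean
    constraint on its own (assumed stand-in for a subtopic-sized passage)."""
    if len(tokens) <= window:
        return satisfies_boolean(tokens, topics)
    return any(
        satisfies_boolean(tokens[i:i + window], topics)
        for i in range(len(tokens) - window + 1)
    )


def tfidf(counts, idf):
    """Weight raw term counts by inverse document frequency."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(docs, topics, window=100):
    """Keep documents that pass both constraints, then rank them by tf-idf
    cosine similarity to the flattened list of query terms."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokenized)
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))
    # Smoothed idf so that terms occurring in every document keep a nonzero weight.
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}
    query = tfidf(Counter(term for topic in topics for term in topic), idf)

    scored = []
    for doc_id, toks in tokenized.items():
        # The Boolean check is a cheap prefilter; the proximity check implies it.
        if satisfies_boolean(toks, topics) and satisfies_proximity(toks, topics, window):
            scored.append((cosine(query, tfidf(Counter(toks), idf)), doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical usage: three topics, each with two alternative terms.
docs = {
    "d1": "the oil spill damaged the coast and cleanup crews arrived",
    "d2": "oil prices rose " + "filler " * 200 + "a spill near the coast",
    "d3": "weather report for the coast",
    "d4": "petroleum leak reached the shore",
}
topics = [["oil", "petroleum"], ["spill", "leak"], ["coast", "shore"]]
print(rank(docs, topics, window=20))
```

With `window=20`, `d3` fails the Boolean constraint because it mentions no spill or leak. `d2` passes the Boolean constraint but fails the proximity constraint, since its topic terms are about 200 tokens apart. Only `d1` and `d4` are scored and ranked. Putting the proximity test on top of the Boolean test mirrors the abstract's layering, where the proximity constraint is imposed over the Boolean one.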

Similar sources

Application of Information Technology: A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search

OBJECTIVES Study comparatively (1) concept-based search, using documents pre-indexed by a conceptual hierarchy; (2) context-sensitive search, using structured, labeled documents; and (3) traditional full-text search. Hypotheses were: (1) more contexts lead to better retrieval accuracy; and (2) adding concept-based search to the other searches would improve upon their baseline performances. DE...

Fuzzy Full-Text Searches in OCR Databases

Though the quality of optical character recognition software is steadily improving, it is still far from being perfect. As a result, full-text databases that are filled by means of OCR software contain many errors. These errors have to be taken into consideration if such kind of databases are examined by means of full-text searches. In this chapter, we will illustrate some of the possible methods...

Multimodal Medical Image Retrieval: Improving Precision at ImageCLEF 2009

We present results from Oregon Health & Science University’s participation in the medical retrieval task of ImageCLEF 2009. This year, we focused on improving retrieval performance, especially early precision, in the task of solving medical multimodal queries. These queries contain visual data, given as a set of image-examples, and textual data, provided as a set of words belonging to three dim...

A Hybrid Information Retrieval Model Using Metadata and Text

Information retrieval (IR) with metadata tends to have high precision as long as the user expresses the information need accurately but may suffer from low recall because queries are too exact with the specification of the metadata fields. On the other hand, full-text retrieval tends to suffer more from low precision especially when queries are simple and the number of documents is large. While...

Improving Search and Retrieval Performance through Shortening Documents, Detecting Garbage, and Throwing Out Jargon

This thesis describes the development of a new search and retrieval system used to index and process queries for several different data sets of documents. This thesis also describes my work with the TREC Legal data set, in particular, the new algorithms I designed to improve recall and precision rates in the legal domain. I have applied novel normalization techniques that are designed to slight...

Publication date: 1996